Weighting and normalisation of synchronous HMMs for audio-visual speech recognition

Authors

David Dean

Patrick Lucey

Sridha Sridharan

Tim Wark

Abstract

In this paper, we examine the effect of varying the stream weights in synchronous multi-stream hidden Markov models (HMMs) for audio-visual speech recognition. Rather than assuming the same stream weights for training and testing, we examine how using different stream weights for each task affects the final speech-recognition performance. Evaluating our system under varying levels of audio and video degradation on the XM2VTS database, we show that the final performance is primarily a function of the stream weight used in testing, and that the stream weight used in training has only a very minor effect. By varying the testing stream weights, we show that the best average speech-recognition performance occurs with the streams weighted at around 80% audio and 20% video. However, by examining the distribution of frame-by-frame scores for each stream on a left-out section of the database, we show that the chosen testing weights primarily serve to normalise the two stream score distributions, rather than indicating how much the final performance depends on either stream. By using a novel adaptation of zero-normalisation to normalise each stream’s models before performing the weighted fusion, we show that the actual contribution of the audio and video scores to the best-performing speech system is closer to equal than the un-normalised stream weighting parameters alone appear to indicate.
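The interaction between stream weights and score normalisation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_znorm`, `apply_znorm`, `fuse`, `effective_audio_share`), the linear weighted sum of per-frame scores, and all numbers are assumptions chosen for illustration. The sketch assumes that zero-normalisation means shifting and scaling each stream's frame scores by the mean and standard deviation estimated on held-out data.

```python
import numpy as np


def fit_znorm(heldout_scores):
    """Estimate the mean and standard deviation of one stream's frame scores
    from a held-out set (hypothetical helper, not from the paper)."""
    scores = np.asarray(heldout_scores, dtype=float)
    return float(scores.mean()), float(scores.std())


def apply_znorm(scores, mean, std):
    """Shift and scale frame scores to zero mean and unit variance."""
    return (np.asarray(scores, dtype=float) - mean) / std


def fuse(audio_scores, video_scores, audio_weight):
    """Weighted sum of per-frame stream scores; the video weight is
    assumed to be 1 - audio_weight."""
    audio = np.asarray(audio_scores, dtype=float)
    video = np.asarray(video_scores, dtype=float)
    return audio_weight * audio + (1.0 - audio_weight) * video


def effective_audio_share(audio_std, video_std, audio_weight):
    """Fraction of the fused score's spread that comes from the audio stream,
    measured as weight times standard deviation for each stream."""
    audio_part = audio_weight * audio_std
    video_part = (1.0 - audio_weight) * video_std
    return audio_part / (audio_part + video_part)


if __name__ == "__main__":
    # Made-up spreads: video frame scores vary four times as much as audio.
    # With un-normalised scores, an 80/20 weighting then merely equalises
    # the two streams instead of favouring audio.
    print(effective_audio_share(audio_std=1.0, video_std=4.0, audio_weight=0.8))

    # After z-normalisation both streams have unit spread, so the weight
    # reflects each stream's contribution directly.
    print(effective_audio_share(audio_std=1.0, video_std=1.0, audio_weight=0.5))
```

Under these assumed spreads, the apparently audio-dominated 80/20 weighting and an equal weighting of normalised scores describe the same balance between streams, which is the kind of effect the abstract attributes to the un-normalised weights.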


Similar resources

Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database

This paper presents methods to improve speech recognition accuracy by incorporating automatic lip reading. The paper improves lip reading accuracy by the following approaches: 1) collection of image and speech synchronous data of 5240 words, 2) feature extraction of 2-dimensional power spectra around a mouth and 3) sub-word unit HMMs with tied-mixture distribution (Tied-Mixture HMMs). Experiments ...


Product HMMs for audio-visual continuous speech recognition using facial animation parameters

The use of visual information in addition to acoustic can improve automatic speech recognition. In this paper we compare different approaches for audio-visual information integration and show how they affect automatic speech recognition performance. We utilize Facial Animation Parameters (FAPs), supported by the MPEG-4 standard for the visual representation as visual features. We use both Singl...


Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition

A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper we show that for audio-visual speech recognition (AVSR), FHMMs can be adopted as a novel method of training synchronous MSHMMs. MSHMMs, as proposed by several authors for use in AVSR, are jointly trained on both the ...


Asynchrony modeling for audio-visual speech recognition

We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...


Fused HMM adaptation of synchronous HMMs for audio-visual speaker verification

A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper, we will show that instead of being treated as a separate modelling technique, FHMMs can be adopted as a novel method of training synchronous hidden Markov models (SHMMs). SHMMs are traditionally jointly trained on bot...



Publication date: 2007
